Supplementary Materials for “Riemannian Pursuit for Big Matrix Recovery”

Authors

  • Mingkui Tan
  • Ivor W. Tsang
  • Li Wang
Abstract

In this supplementary file, we first present the parameter setting for ρ, and then present the proofs of the lemmas and theorems that appear in the main paper.

1. Parameter Setting for ρ

In the main paper, we present (14) as a simple and effective method for choosing ρ, motivated by the thresholding strategy in StOMP for sparse signal recovery (Donoho et al., 2012). Specifically, let σ be the vector of singular values of A^*(b), with the entries σ_i arranged in descending order. We choose ρ such that

σ_i ≥ η σ_1, ∀ i ≤ ρ,   (1)

where η ≥ 0.60 is usually a good choice. However, it is not trivial to predict the number of singular values that satisfy (1) for big matrices if we do not want to compute a full SVD. Since ρ is in general small, we propose to compute the σ_i sequentially until condition (1) is violated. Let B ≥ 1 be a small integer; we compute B singular values per iteration. Basically, if σ_i ≥ η σ_1 (where i ≥ 2), we compute the singular values σ_{i+1}, ..., σ_{i+B} by performing a rank-B truncated SVD on A_i = A^*(b) − Σ_{j=1}^{i} σ_j u_j v_j^T using PROPACK. In practice, we suggest setting B ≥ 2. The scheme, Sequential Truncated SVD for Setting ρ, is presented in Algorithm 1. Notice that PROPACK involves only matrix-vector products with A_i and A_i^T; for the low-rank term U diag(σ) V^T in A_i, the product (U diag(σ) V^T) r can be computed cheaply as U (diag(σ) (V^T r)). We remark that, instead of Algorithm 1, a more efficient technique may restart a Krylov-based method such as PROPACK with an increasingly larger subspace until (1) is satisfied.

Algorithm 1: Sequential Truncated SVD for Setting ρ.
1: Given η and A^*(b), initialize ρ = 2 and B > 1.
2: Compute the rank-2 truncated SVD of A^*(b), obtaining σ ∈ R^2, U ∈ R^{m×2} and V ∈ R^{n×2}.
3: If σ_ρ < η σ_1, stop and return ρ = 2.
4: while σ_ρ ≥ η σ_1 do
5:   Let ρ = ρ + B.
6:   Compute the rank-B truncated SVD of A^*(b) − U diag(σ) V^T, obtaining σ_B ∈ R^B, U_B ∈ R^{m×B} and V_B ∈ R^{n×B}.
7:   Let U = [U, U_B], V = [V, V_B] and σ = [σ; σ_B].
8: end while
9: Return ρ.
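For concreteness, the following is a minimal Python sketch of Algorithm 1. It is an illustration rather than the authors' implementation: scipy.sparse.linalg.svds (a Lanczos-type truncated-SVD solver) stands in for PROPACK, the dense matrix M plays the role of A^*(b), and all names, defaults and the size guard are ours.

```python
import numpy as np
from scipy.sparse.linalg import svds

def descending(U, s, Vt):
    # Reorder a truncated SVD so that the singular values are descending.
    order = np.argsort(s)[::-1]
    return U[:, order], s[order], Vt[order, :]

def set_rho(M, eta=0.6, B=2):
    # Step 2: rank-2 truncated SVD of M (playing the role of A*(b)).
    U, s, Vt = descending(*svds(M, k=2))
    # Step 3: if sigma_2 already falls below the threshold, return rho = 2.
    if s[1] < eta * s[0]:
        return 2
    rho = 2
    # Step 4: continue while condition (1) holds for the smallest computed value.
    while s[-1] >= eta * s[0] and rho + B < min(M.shape):
        rho += B  # Step 5
        # Step 6: rank-B truncated SVD of the residual M - U diag(s) V^T.
        UB, sB, VBt = descending(*svds(M - (U * s) @ Vt, k=B))
        # Step 7: append the newly computed singular triplets.
        U, Vt, s = np.hstack([U, UB]), np.vstack([Vt, VBt]), np.concatenate([s, sB])
    return rho  # Step 9

# Usage: a 100 x 80 matrix with singular values 10, 9, 8, 1 and 0.5.
rng = np.random.default_rng(0)
U0 = np.linalg.qr(rng.standard_normal((100, 5)))[0]
V0 = np.linalg.qr(rng.standard_normal((80, 5)))[0]
M = (U0 * np.array([10.0, 9.0, 8.0, 1.0, 0.5])) @ V0.T
print(set_rho(M))  # prints 4: three values exceed 0.6 * sigma_1, plus overshoot
```

As step 9 of Algorithm 1 suggests, the returned ρ may overshoot the largest index satisfying (1) by up to B − 1; trimming trailing values below the threshold is a straightforward refinement.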
2. Main Theoretical Results in the Paper

We first restate the main results of the paper before proving Lemma 1 and Theorem 1 (the other results are already proven in the paper).

Proposition 1. In MC, suppose the observed entry set Ξ is sampled according to the Bernoulli model, with each entry (i, j) ∈ Ξ drawn independently with probability p. There exists a constant C > 0 such that for all γ_r ∈ (0, 1), μ_B ≥ 1 and n ≥ m ≥ 3, if p ≥ C μ_B r log(n)/(γ_r^2 m), then the following RIP condition holds:

(1 − γ_r) p ‖X‖_F^2 ≤ ‖P_Ξ(X)‖_F^2 ≤ (1 + γ_r) p ‖X‖_F^2,   (2)

for any μ_B-incoherent matrix X ∈ R^{m×n} of rank at most r, with probability at least 1 − exp(−n log n).

Lemma 1. Let {X^t} be the sequence generated by RP. Then

f(X^t) ≤ f(X^{t−1}) − (τ_t/2) ‖H_2^{t−1}‖_F^2,   (3)

where τ_t satisfies the condition in (10) of the paper.

Theorem 1. Let {X^t} be the sequence generated by RP and ζ = min{τ_1, ..., τ_ι}. As long as f(X^t) ≥ C^2 ‖e‖^2 (where C > 1), and if there exists an integer ι > 0 such that γ_(r̂+2ιρ) < 1/2, then RP decreases linearly in objective value when t < ι, namely f(X^{t+1}) ≤ ν f(X^t), where

ν = 1 − (ρζ/(2r̂)) · [C(1 − 2γ_(r̂+2ιρ)) / ((√C + 1)^2 (1 − γ_(r̂+2ιρ)))] · (1 − 1/√C)^2.

Proposition 2 (Sato & Iwai, 2013). Given the retraction (8) and the vector transport (20) on M_r in the paper, there exists a step size θ_k that satisfies the strong Wolfe conditions (17) and (18) of the paper.

Lemma 2. If c_2 < 1/2, then the search direction ζ_k generated by NRCG with the Fletcher-Reeves rule and strong Wolfe step-size control satisfies

−1/(1 − c_2) ≤ ⟨grad f(X_k), ζ_k⟩ / ⟨grad f(X_k), grad f(X_k)⟩ ≤ (2c_2 − 1)/(1 − c_2).   (4)

Theorem 2. Let {X_k} be the sequence generated by NRCG with the strong Wolfe line search, where 0 < c_1 < c_2 < 1/2. Then lim inf_{k→∞} ‖grad f(X_k)‖ = 0.

3. Proof of Lemma 1 in the Paper

The step size τ_t is determined such that

f(R_{X^{t−1}}(−τ_t H^{t−1})) ≤ f(X^{t−1}) − (τ_t/2) ⟨H^{t−1}, H^{t−1}⟩.

Since ⟨H_1^{t−1}, H_2^{t−1}⟩ = 0, it follows that

f(X^t) ≤ f(X^{t−1}) − (τ_t/2) ‖H_1^{t−1}‖^2 − (τ_t/2) ‖H_2^{t−1}‖^2 ≤ f(X^{t−1}) − (τ_t/2) ‖H_2^{t−1}‖^2,

where the last inequality holds with equality when H_1^{t−1} = 0, which happens if we solve the master problem exactly.

4. Proof of Theorem 1 in the Paper

4.1. Key Lemmas

To complete the proof of Theorem 1, we first recall a property of the orthogonal projection P_{T_X M_r}(Z).

Lemma 3. Given the orthogonal projection P_{T_X M_r}(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥, we have rank(P_{T_X M_r}(Z)) ≤ 2 min(rank(Z), rank(P_U)) for any X. In addition, ‖P_{T_X M_r}(Z)‖_F ≤ ‖Z‖_F for any Z ∈ R^{m×n}.

Proof. According to (Shalit et al., 2012), the three terms in P_{T_X M_r}(Z) are mutually orthogonal. Since rank(P_U) = rank(P_V), we have

rank(P_{T_X M_r}(Z)) = rank(Z P_V + P_U Z P_V^⊥) ≤ min(rank(Z), rank(P_V)) + min(rank(Z), rank(P_U)) = 2 min(rank(Z), rank(P_U)).

The relation ‖P_{T_X M_r}(Z)‖_F ≤ ‖Z‖_F follows immediately from the fact that P_{T_X M_r} is an orthogonal projection with respect to the Frobenius inner product. □
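As a sanity check on Lemma 3, the following short numpy experiment builds the projection P_{T_X}(Z) = P_U Z + Z P_V − P_U Z P_V at a random rank-r point and verifies the rank bound and non-expansiveness, as well as the exactness and orthogonality of the splitting Z = P_{T_X}(Z) + P_{T_X}^⊥(Z) used in Section 4.2 below. This is our own illustration; the dimensions are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 60, 50, 5

# A random rank-r point X = U S V^T on M_r; only U and V matter here.
U = np.linalg.qr(rng.standard_normal((m, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
PU, PV = U @ U.T, V @ V.T

def P_T(Z):
    # P_{T_X}(Z) = P_U Z + Z P_V - P_U Z P_V.
    return PU @ Z + Z @ PV - PU @ Z @ PV

def P_T_perp(Z):
    # Complement: P_{T_X}^perp(Z) = (I - P_U) Z (I - P_V).
    return (np.eye(m) - PU) @ Z @ (np.eye(n) - PV)

Z = rng.standard_normal((m, n))
ZC, ZQ = P_T(Z), P_T_perp(Z)

print(np.linalg.matrix_rank(ZC) <= 2 * r)                     # rank bound of Lemma 3
print(np.linalg.norm(ZC, "fro") <= np.linalg.norm(Z, "fro"))  # non-expansiveness
print(np.allclose(ZC + ZQ, Z))                                # the split is exact
print(abs(np.sum(ZC * ZQ)) < 1e-8)                            # the two parts are orthogonal
```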
4.2. Notation

We first introduce some notation. Let X̂ and e be the ground-truth low-rank matrix and the additive noise, respectively. Moreover, let {X^t} be the sequence generated by RP, ξ^t = A(X^t) − b, and G^t = A^*(ξ^t). In RP, we solve the fixed-rank subproblem approximately by the NRCG method.

Recall the definition of the orthogonal projection onto the tangent space at X = U S V^T:

P_{T_X M_r}(Z) := P_{T_X}(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥ = P_U Z + Z P_V − P_U Z P_V,

where P_U = U U^T and P_V = V V^T. In addition, denote by P_{T_X}^⊥ the complement of P_{T_X}, namely P_{T_X}^⊥(Z) = (I − P_U) Z (I − P_V).

Now, recalling E^t = P_{T_{X^t} M_{tρ}}(G^t) = P_{T_{X^t} M_{tρ}}(A^*(ξ^t)), we have

⟨X^t, A^*(ξ^t)⟩ = ⟨X^t, E^t⟩.   (5)

At the t-th iteration, rank(X^t) = tρ; thus X^t − X̂ has rank at most (r̂ + tρ), where r̂ = rank(X̂). By the orthogonal projection P_{T_{X^t}}, we decompose X̂ into two mutually orthogonal matrices:

X̂ = X̂_{C^t} + X̂_{Q^t}, where X̂_{C^t} = P_{T_{X^t}}(X̂) and X̂_{Q^t} = P_{T_{X^t}}^⊥(X̂).   (6)

Based on this decomposition, it follows that ⟨X^t − X̂_{C^t}, X̂_{Q^t}⟩ = 0. Without loss of generality, we assume that r̂ ≥ tρ. According to Lemma 3, we have rank(X̂_{C^t}) ≤ 2 min(rank(X̂), rank(P_U)) = 2tρ and rank(X̂_{Q^t}) ≤ r̂. Moreover, since X^t − P_{T_{X^t}}(X̂) = P_{T_{X^t}}(X^t − X̂) lies in the tangent space at X^t, we have rank(X^t − P_{T_{X^t}}(X̂)) ≤ 2tρ.

Note that at the (t+1)-th iteration of RP, we increase the rank of X^t by ρ by performing a line search along H^t = H_1^t + H_2^t, where H_1^t = P_{T_{X^t}}(G^t) and H_2^t = U_ρ diag(σ_ρ) V_ρ^T ∈ Ran P_{T_{X^t}}^⊥.

For convenience of description, we define an internal variable Z^{t+1} ∈ R^{m×n} as

Z^{t+1} = X^t − τ_t G^t,   (7)

where τ_t is the step size used in (10) in the paper. Based on the decomposition of H^t, we decompose Z^{t+1} into

Z^{t+1} = Z_1^{t+1} + Z_Q^{t+1} + Z_R^{t+1},   (8)

where

Z_1^{t+1} = P_{T_{X^t}}(Z^{t+1}) = X^t − τ_t P_{T_{X^t}}(G^t),
Z_Q^{t+1} = P_{T_{X̂_{Q^t}}}(Z^{t+1}) = −τ_t P_{T_{X̂_{Q^t}}}(G^t),   (9)
Z_R^{t+1} = (I − P_{T_{X^t}} − P_{T_{X̂_{Q^t}}})(Z^{t+1}).

Observe from (6) that X̂_{Q^t}^T X^t = X^t X̂_{Q^t}^T = 0. Hence

Ran P_{T_{X^t}} ⊥ Ran P_{T_{X̂_{Q^t}}} ⊥ Ran(I − P_{T_{X^t}} − P_{T_{X̂_{Q^t}}}),   (10)

which implies that the three matrices above are mutually orthogonal.

Similarly, G^t is decomposed into three mutually orthogonal parts, G^t = G_1^t + G_Q^t + G_R^t, where

G_1^t = P_{T_{X^t}}(G^t), G_Q^t = P_{T_{X̂_{Q^t}}}(G^t), and G_R^t = (I − P_{T_{X^t}} − P_{T_{X̂_{Q^t}}})(G^t).   (11)

4.3. Proof of Theorem 1

The proof of Theorem 1 relies on three bounds on f(X^t), in terms of X̂_{Q^t}, Z_Q^{t+1} and ‖H_2^t‖_F, respectively. For convenience, we first list these bounds and use them to complete the proof of Theorem 1; the detailed proofs of the three bounds are deferred to Section 4.4.

First, the following lemma bounds f(X^t) in terms of X̂_{Q^t}.

Lemma 4. At the t-th iteration, if γ_(r̂+2tρ) < 1/2, then

f(X^t) ≥ (1/2) · [C(1 − 2γ_(r̂+2tρ)) / ((√C + 1)^2 (1 − γ_(r̂+2tρ)))] · ‖X̂_{Q^t}‖_F^2.

The following lemma bounds f(X^t) with respect to Z_Q^{t+1}.

Lemma 5. Suppose ‖E^t‖_F is sufficiently small, where E^t = P_{T_{X^t}}(G^t). For γ_(r̂+2tρ) < 1/2 and C > 1, we have

‖Z_Q^{t+1}‖_F^2 ≥ [2Cτ_t^2 (1 − 2γ_(r̂+2tρ)) / ((√C + 1)^2 (1 − γ_(r̂+2tρ)))] · (1 − 1/√C)^2 f(X^t).

By combining Lemmas 4 and 5, we obtain the following bound on f(X^t) with respect to H_2^t.

Lemma 6. If γ_(r̂+2tρ) < 1/2, then at the t-th iteration we have

‖H_2^t‖_F^2 ≥ (ρ/r̂) · [C(1 − 2γ_(r̂+2tρ)) / ((√C + 1)^2 (1 − γ_(r̂+2tρ)))] · (1 − 1/√C)^2 f(X^t).

Proof of Theorem 1. By combining Lemma 1 and Lemma 6, we have

f(X^{t+1}) ≤ f(X^t) − (τ_t/2) ‖H_2^t‖_F^2 ≤ (1 − (ρτ_t/(2r̂)) · [C(1 − 2γ_(r̂+2tρ)) / ((√C + 1)^2 (1 − γ_(r̂+2tρ)))] · (1 − 1/√C)^2) f(X^t).

The step size τ_t is obtained by the line search, so with ζ = min{τ_1, ..., τ_ι} the above relation holds, with τ_t replaced by ζ, for each t < ι, where γ_(r̂+2ιρ) < 1/2. Note that (1 − 2γ_(r̂+2tρ))/(1 − γ_(r̂+2tρ)) is decreasing in γ_(r̂+2tρ) on (0, 1/2). In addition, since γ_(r̂+2tρ) ≤ γ_(r̂+2ιρ) holds for all t ≤ ι, the following relation holds if γ_(r̂+2ιρ) < 1/2 and t < ι:

f(X^{t+1}) ≤ (1 − (ρζ/(2r̂)) · [C(1 − 2γ_(r̂+2ιρ)) / ((√C + 1)^2 (1 − γ_(r̂+2ιρ)))] · (1 − 1/√C)^2) f(X^t).

This completes the proof of Theorem 1. □
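To make the contraction factor ν concrete, consider purely illustrative values of the constants (not taken from the paper): C = 4, γ_(r̂+2ιρ) = 1/4, ρ/r̂ = 1/2 and ζ = 1. Then

```latex
\nu \;=\; 1-\frac{\rho\zeta}{2\hat{r}}
\left(\frac{C\bigl(1-2\gamma_{(\hat{r}+2\iota\rho)}\bigr)}
           {(\sqrt{C}+1)^2\bigl(1-\gamma_{(\hat{r}+2\iota\rho)}\bigr)}\right)
\left(1-\frac{1}{\sqrt{C}}\right)^{2}
\;=\; 1-\frac{1}{4}\cdot\frac{4\cdot\tfrac{1}{2}}{9\cdot\tfrac{3}{4}}\cdot\frac{1}{4}
\;=\; 1-\frac{1}{54}\;\approx\;0.981,
```

so under these constants each rank-increasing iteration reduces the objective by a factor of at least 1/54, roughly 2%; smaller RIP constants or a larger rank increment ρ relative to r̂ yield a faster decrease.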
4.4. Detailed Proofs of the Three Bounds

4.4.1. Key Lemmas

To proceed, we recall a property of the constant γ_r in the RIP.

Lemma 7 (Lemma 3.3 in Candès & Plan, 2010). For all X_P, X_Q ∈ R^{m×n} satisfying ⟨X_P, X_Q⟩ = 0, where rank(X_P) ≤ r_p and rank(X_Q) ≤ r_q,

|⟨A(X_P), A(X_Q)⟩| ≤ γ_{r_p+r_q} ‖X_P‖_F ‖X_Q‖_F.   (12)

In addition, for any two integers r ≤ s, we have γ_r ≤ γ_s (Dai & Milenkovic, 2009).

Suppose b_q = A(X_Q) for some X_Q with rank(X_Q) = r_q. Define b_p = A(X_P), where X_P is the optimal solution of the following problem:

min_X (1/2) ‖A(X) − A(X_Q)‖^2, s.t. rank(X) = r_p, ⟨X, X_Q⟩ = 0.   (13)

Let b_r = b_p − b_q = A(X_P) − A(X_Q). Then the following relations hold.

Lemma 8. With b_q, b_p and b_r defined above, if γ_max(r_p,r_q) + γ_{r_p+r_q} ≤ 1, then

‖b_p‖ ≤ [γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))] ‖b_q‖, and   (14)

(1 − γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))) ‖b_q‖ ≤ ‖b_r‖ ≤ ‖b_q‖.   (15)

Proof. Since ⟨X_Q, X_P⟩ = 0, Lemma 7 together with the RIP bounds ‖X_P‖_F ≤ ‖b_p‖/√(1 − γ_{r_p}) and ‖X_Q‖_F ≤ ‖b_q‖/√(1 − γ_{r_q}) gives

|b_p^T b_q| = |⟨A(X_P), A(X_Q)⟩| ≤ γ_{r_p+r_q} ‖X_P‖_F ‖X_Q‖_F ≤ γ_{r_p+r_q} (‖b_p‖/√(1 − γ_{r_p})) (‖b_q‖/√(1 − γ_{r_q})) ≤ [γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))] ‖b_p‖ ‖b_q‖.

Now we show that b_p^T b_r = 0. Let X_P = U diag(σ) V^T. Since X_P is the minimizer of (13), σ is also the minimizer of the linear least-squares problem

min_σ ‖b_q − Dσ‖^2,   (16)

where D = [A(u_1 v_1^T), ..., A(u_{r_p} v_{r_p}^T)]. The Galerkin condition for this least-squares system states that b^T(Dσ − b_q) = 0 for any b in the column span of D. Since b_p = A(X_P) lies in the span of D, we obtain b_p^T b_r = 0. Recalling b_r = b_p − b_q, it then follows that

|b_p^T b_q| = |b_p^T (b_p − b_r)| = ‖b_p‖^2.

Accordingly, we have

‖b_p‖ ≤ [γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))] ‖b_q‖.

Using the reverse triangle inequality, ‖b_r‖ = ‖b_q − b_p‖ ≥ | ‖b_q‖ − ‖b_p‖ |, we obtain

‖b_r‖ ≥ (1 − γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))) ‖b_q‖.

By the Galerkin condition of (16), b_p ⊥ b_r, so ‖b_q‖^2 = ‖b_r‖^2 + ‖b_p‖^2, which gives ‖b_r‖ ≤ ‖b_q‖. Together we obtain

(1 − γ_{r_p+r_q} / (1 − γ_max(r_p,r_q))) ‖b_q‖ ≤ ‖b_r‖ ≤ ‖b_q‖.

Finally, the condition γ_max(r_p,r_q) + γ_{r_p+r_q} ≤ 1 ensures the positivity of (1 − γ_{r_p+r_q}/(1 − γ_max(r_p,r_q))). This completes the proof. □

4.4.2. Proof of Lemma 4

Recall (6). Since b = A(X̂) + e, we have

√f(X^t) = √((1/2) ‖A(X^t) − b‖^2) = (1/√2) ‖A(X^t − X̂) − e‖
≥ (1/√2) (‖A(X^t − X̂)‖ − ‖e‖)
= (1/√2) (‖A(X^t − X̂_{C^t}) − A(X̂_{Q^t})‖ − ‖e‖).

Note that ⟨X^t − X̂_{C^t}, X̂_{Q^t}⟩ = 0 and rank(X^t − X̂_{C^t}) ≤ 2tρ. By applying Lemma 8, where we let b_q = A(X̂_{Q^t}) and b_r = A(X_P) − A(X̂_{Q^t}) with X_P the minimizer below, it follows that

√f(X^t) ≥ (1/√2) ( min_{rank(X)=2tρ, ⟨X, X̂_{Q^t}⟩=0} ‖A(X) − A(X̂_{Q^t})‖ − ‖e‖ )
≥ (1/√2) ( (1 − γ_{r̂+2tρ} / (1 − γ_max(2tρ, r̂))) ‖A(X̂_{Q^t})‖ − ‖e‖ )   (by Lemma 8)
≥ (1/√2) ( (1 − γ_{r̂+2tρ} / (1 − γ_max(2tρ, r̂))) √(1 − γ_r̂) ‖X̂_{Q^t}‖_F − ‖e‖ )   (by the RIP condition)
≥ (1/√2) ( (1 − γ_{r̂+2tρ} / (1 − γ_{r̂+2tρ})) √(1 − γ_{r̂+2tρ}) ‖X̂_{Q^t}‖_F − ‖e‖ )   (by γ_{r̂+2tρ} ≥ γ_max(2tρ, r̂) ≥ γ_r̂)
= (1/√2) ( [(1 − 2γ_{r̂+2tρ}) / √(1 − γ_{r̂+2tρ})] ‖X̂_{Q^t}‖_F − ‖e‖ ).

Solving the resulting inequality for √f(X^t), using ‖e‖ ≤ √f(X^t)/C from the assumption f(X^t) ≥ C^2 ‖e‖^2 in Theorem 1, yields the bound stated in Lemma 4.
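To illustrate the restricted-orthogonality property (12) that drives these bounds, the following numpy experiment checks that |⟨A(X_P), A(X_Q)⟩| stays small relative to ‖X_P‖_F ‖X_Q‖_F for orthogonal low-rank inputs. The Gaussian measurement operator, sizes and ranks below are our own test choices, not the measurement model of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, rp, rq = 30, 30, 2, 2
n_meas = 600  # number of linear measurements

# A(X) = A_mat @ vec(X) with i.i.d. N(0, 1/n_meas) entries, so that
# E||A(X)||^2 = ||X||_F^2 (an approximate isometry on low-rank matrices).
A_mat = rng.standard_normal((n_meas, m * n)) / np.sqrt(n_meas)
A = lambda X: A_mat @ X.ravel()

# Two low-rank matrices with orthogonal row spaces, hence <X_P, X_Q> = 0.
Q = np.linalg.qr(rng.standard_normal((n, rp + rq)))[0]
X_P = rng.standard_normal((m, rp)) @ Q[:, :rp].T
X_Q = rng.standard_normal((m, rq)) @ Q[:, rp:].T
assert abs(np.sum(X_P * X_Q)) < 1e-10

# A(X_P) and A(X_Q) are no longer exactly orthogonal, but their inner product
# remains small relative to ||X_P||_F ||X_Q||_F, as (12) asserts.
ratio = abs(A(X_P) @ A(X_Q)) / (np.linalg.norm(X_P, "fro") * np.linalg.norm(X_Q, "fro"))
print(ratio)  # empirically well below 1, consistent with a small gamma_{rp+rq}
```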
